Add PostgreSQL observability telemetry exposure #1808

Merged
mploski merged 12 commits into feature/database-controllers from postgres-operator-monitoring on Apr 17, 2026

@DmytroPI-dev DmytroPI-dev commented Apr 1, 2026

Description

Adds PostgreSQL observability telemetry for PostgresCluster using Prometheus pod-annotation-based scraping. Metrics are exposed by CNPG's built-in exporters on PostgreSQL pods (port 9187) and PgBouncer pooler pods (port 9127). The operator controls whether annotations are injected via class- and cluster-level configuration, with no dedicated metrics Service or ServiceMonitor required for PostgreSQL or PgBouncer scraping.

A ServiceMonitor is still supported for operator-controller metrics as an optional step.

Key Changes

api/v4/postgresclusterclass_types.go
Added class-level observability configuration (monitoring.postgresqlMetrics.enabled, monitoring.connectionPoolerMetrics.enabled) that controls whether scrape annotations are injected into CNPG pods.

api/v4/postgrescluster_types.go
Added cluster-level disable-only overrides (spec.monitoring.postgresqlMetrics.disabled, spec.monitoring.connectionPoolerMetrics.disabled) allowing per-cluster opt-out without changing the class.
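The interaction between the class-level `enabled` flag and the cluster-level `disabled` override can be sketched as follows. This is a minimal illustration, not the PR's code: the `classMetrics` and `clusterMetricsOverride` types and the `resolveMetricsEnabled` function are hypothetical stand-ins, and the real API types in api/v4 may be shaped differently.

```go
package main

import "fmt"

// classMetrics is a hypothetical stand-in for a class-level setting such as
// monitoring.postgresqlMetrics.enabled.
type classMetrics struct {
	Enabled bool
}

// clusterMetricsOverride is a hypothetical stand-in for a cluster-level
// setting such as spec.monitoring.postgresqlMetrics.disabled.
// A nil pointer means the cluster did not set the override.
type clusterMetricsOverride struct {
	Disabled *bool
}

// resolveMetricsEnabled returns true only when the class enables metrics and
// the cluster has not opted out. Because the override is disable-only, a
// cluster cannot switch metrics on when the class has them off.
func resolveMetricsEnabled(class classMetrics, override clusterMetricsOverride) bool {
	if !class.Enabled {
		return false
	}
	if override.Disabled != nil && *override.Disabled {
		return false
	}
	return true
}

func main() {
	optOut := true
	// Class on, no override.
	fmt.Println(resolveMetricsEnabled(classMetrics{Enabled: true}, clusterMetricsOverride{}))
	// Class on, cluster opts out.
	fmt.Println(resolveMetricsEnabled(classMetrics{Enabled: true}, clusterMetricsOverride{Disabled: &optOut}))
	// Class off, no override.
	fmt.Println(resolveMetricsEnabled(classMetrics{Enabled: false}, clusterMetricsOverride{}))
}
```

Under these assumptions only the first case resolves to enabled, which matches the "opt-out without changing the class" behaviour described above.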

pkg/postgresql/cluster/core/cluster.go
Wired observability flag resolution into PostgresCluster reconciliation. When enabled, sets InheritedMetadata.Annotations on the CNPG Cluster (for PostgreSQL pods) and Template.ObjectMeta.Annotations on CNPG Pooler resources (for PgBouncer pods).

pkg/postgresql/cluster/core/monitoring.go
Added isPostgreSQLMetricsEnabled / isConnectionPoolerMetricsEnabled flag resolution helpers.
Added buildPostgresScrapeAnnotations / buildPoolerScrapeAnnotations annotation builders.
Added removeScrapeAnnotations for the disable path.
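A rough sketch of what annotation builders and a removal helper of this kind might look like is shown below. The annotation keys follow the common `prometheus.io/*` convention and the `/metrics` path is assumed; neither is confirmed by this conversation. The function names `scrapeAnnotationsFor` and `stripScrapeAnnotations` are hypothetical and are not the PR's helpers.

```go
package main

import "fmt"

// Assumed annotation keys (the widely used prometheus.io/* convention).
// The PR's actual constants are not shown in this conversation.
const (
	scrapeAnnotation = "prometheus.io/scrape"
	portAnnotation   = "prometheus.io/port"
	pathAnnotation   = "prometheus.io/path"
)

// scrapeAnnotationsFor builds the scrape annotations for a metrics port,
// for example "9187" for PostgreSQL pods or "9127" for PgBouncer pods.
func scrapeAnnotationsFor(port string) map[string]string {
	return map[string]string{
		scrapeAnnotation: "true",
		portAnnotation:   port,
		pathAnnotation:   "/metrics", // assumed metrics path
	}
}

// stripScrapeAnnotations removes only the scrape-related keys and leaves any
// unrelated annotations untouched. Deleting from a nil map is a no-op in Go.
func stripScrapeAnnotations(annotations map[string]string) {
	for _, key := range []string{scrapeAnnotation, portAnnotation, pathAnnotation} {
		delete(annotations, key)
	}
}

func main() {
	// Start from an unrelated annotation that must survive the disable path.
	podAnnotations := map[string]string{"example.com/team": "db"}

	// Enable path: merge the scrape annotations into the existing map.
	for k, v := range scrapeAnnotationsFor("9187") {
		podAnnotations[k] = v
	}
	fmt.Println(podAnnotations[portAnnotation])

	// Disable path: drop the scrape annotations again.
	stripScrapeAnnotations(podAnnotations)
	fmt.Println(len(podAnnotations))
}
```

Merging into and stripping from an existing map, rather than replacing it, keeps annotations owned by other tooling intact when monitoring is toggled.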

pkg/postgresql/cluster/core/monitoring_unit_test.go
Added unit tests for flag resolution, scrape annotation builders, and annotation removal.

internal/controller/postgrescluster_controller_test.go
Added integration tests verifying that InheritedMetadata annotations are set on the CNPG Cluster when monitoring is enabled and removed when disabled by cluster override.

docs/PostgreSQLObservabilityDashboard.json
Reference Grafana dashboard covering PostgreSQL target count, RW/RO PgBouncer availability, WAL activity, database sizes, PgBouncer client load, controller reconcile metrics, and domain fleet metrics.

docs/postgresSQLMonitoring-e2e.md
End-to-end validation guide for the annotation-based scraping flow on KIND.

Testing and Verification

Added unit tests in pkg/postgresql/cluster/core/monitoring_unit_test.go for:

  • class/cluster observability enablement logic
  • scrape annotation builders for PostgreSQL (port 9187) and PgBouncer (port 9127)
  • annotation removal on the disable path

Added integration tests in internal/controller/postgrescluster_controller_test.go verifying:

  • InheritedMetadata.Annotations presence when monitoring is enabled
  • annotation removal when disabled by cluster-level override

Related Issues

CPI-1853 — related JIRA ticket.

Grafana screenshot:

Screenshot 2026-04-15 at 14 33 08

PR Checklist

  • Code changes adhere to the project's coding standards.
  • Relevant unit and integration tests are included.
  • Documentation has been updated accordingly.
  • All tests pass locally.
  • The PR description follows the project's guidelines.

@DmytroPI-dev force-pushed the postgres-operator-monitoring branch from a1b796f to 976ecd1 on April 2, 2026 14:08
@DmytroPI-dev changed the title from "Create ServiceMonitor and basic Grafana dashboard for metrics" to "Add PostgreSQL observability telemetry exposure via ServiceMonitors" on Apr 2, 2026
Comment thread docs/PostgreSQLObservabilityDashboard.md Outdated
Comment thread pkg/postgresql/cluster/core/cluster.go
Comment thread pkg/postgresql/cluster/core/cluster.go Outdated
Comment thread docs/PostgreSQLObservabilityDashboard.md
Comment thread pkg/postgresql/cluster/core/monitoring.go Outdated
Comment thread pkg/postgresql/cluster/core/cluster.go Outdated
Comment thread pkg/postgresql/cluster/core/cluster.go Outdated
Contributor

github-actions bot commented Apr 10, 2026

CLA Assistant Lite bot:
Thank you for your submission, we really appreciate it. Like many open-source projects, we ask that you sign our Contribution License Agreement before we can accept your contribution. You can sign the CLA by just posting a Pull Request Comment with the exact sentence copied from below.


I have read the CLA Document and I hereby sign the CLA


1 out of 3 committers have signed the CLA.
@DmytroPI-dev
@limak9182
@mploski
You can retrigger this bot by commenting recheck in this Pull Request

Comment thread api/v4/postgrescluster_types.go Outdated
Comment thread api/v4/postgrescluster_types.go Outdated
Comment thread api/v4/postgresclusterclass_types.go Outdated
Comment thread pkg/postgresql/cluster/core/monitoring.go Outdated
Comment thread pkg/postgresql/cluster/core/monitoring.go Outdated
Comment thread pkg/postgresql/cluster/core/monitoring.go Outdated
Comment thread pkg/postgresql/cluster/core/monitoring.go Outdated
@DmytroPI-dev force-pushed the postgres-operator-monitoring branch from d710f58 to 63b5937 on April 13, 2026 09:54
@DmytroPI-dev changed the title from "Add PostgreSQL observability telemetry exposure via ServiceMonitors" to "Add PostgreSQL observability telemetry exposure" on Apr 15, 2026
@DmytroPI-dev force-pushed the postgres-operator-monitoring branch from 988138d to 08dfa16 on April 15, 2026 12:59
@DmytroPI-dev marked this pull request as ready for review on April 15, 2026 18:52
Comment thread api/v4/postgrescluster_types.go Outdated
Comment thread docs/postgresSQLMonitoring-e2e.md Outdated
Comment thread docs/postgresSQLMonitoring-otel-e2e.md Outdated
Comment thread docs/postgresSQLMonitoring-otel-e2e.md Outdated
Comment thread docs/postgresSQLMonitoring-otel-e2e.md Outdated
Comment thread internal/controller/postgrescluster_controller.go Outdated
Comment thread internal/controller/postgrescluster_controller_test.go Outdated
Comment thread internal/controller/postgresdatabase_controller.go Outdated
Comment thread pkg/postgresql/cluster/core/cluster.go Outdated
Comment thread pkg/postgresql/cluster/core/cluster.go
Comment thread pkg/postgresql/cluster/core/cluster.go Outdated
oldConditions := make([]metav1.Condition, len(postgresCluster.Status.Conditions))
copy(oldConditions, postgresCluster.Status.Conditions)

if err := reconcilePostgreSQLMetricsService(ctx, c, rc.Scheme, postgresCluster, postgresMetricsEnabled); err != nil {

That could potentially be packed into a unit-of-work style code block, as it's very repetitive across the logic statements. It could be packed into an executable interface that handles the componentMetrics, or something along those lines.
The code would then be separated into testable blocks and orchestrated cleanly.

units, nice!

Comment thread pkg/postgresql/cluster/core/monitoring.go Outdated
return ctrl.Result{}, errors.Join(err, statusErr)
}

postgresMetricsEnabled := isPostgreSQLMetricsEnabled(postgresCluster, clusterClass)

Just an idea: could that be merged into a general port like "are the component's metrics enabled"? That would leave room to expand later without adding isComponent1MetricsEnabled, ..., isComponentNMetricsEnabled. Just one method, getComponentMetricsSettings(...), returning map[string(component)]bool. One config poll for anything that comes in the future.

Collaborator

Let's do this as part of the refactor initiative. Not sure whether this should be a separate port, though, as it is tightly coupled to the low-level cluster provider.

As the cluster provider is still orchestrated/managed by k8s, I would say it's "portable". Yeah, let's analyse it along the way later.
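For reference, the single-lookup idea floated in this thread could look roughly like the sketch below. It was deferred to a later refactor, so this is purely illustrative: the `componentSetting` type, the component names, and `resolveComponentMetrics` are all hypothetical and do not exist in the PR.

```go
package main

import (
	"fmt"
	"sort"
)

// componentSetting is a hypothetical pairing of the class-level enable flag
// and the cluster-level disable-only override for one component.
type componentSetting struct {
	ClassEnabled    bool
	ClusterDisabled bool
}

// resolveComponentMetrics resolves every component in one pass and returns a
// map from component name to its effective "metrics enabled" state, so that a
// new component needs a new map entry rather than a new helper function.
func resolveComponentMetrics(settings map[string]componentSetting) map[string]bool {
	resolved := make(map[string]bool, len(settings))
	for name, s := range settings {
		resolved[name] = s.ClassEnabled && !s.ClusterDisabled
	}
	return resolved
}

func main() {
	resolved := resolveComponentMetrics(map[string]componentSetting{
		"postgresql":       {ClassEnabled: true},
		"connectionPooler": {ClassEnabled: true, ClusterDisabled: true},
	})

	// Sort the names so the printed order is stable.
	names := make([]string, 0, len(resolved))
	for name := range resolved {
		names = append(names, name)
	}
	sort.Strings(names)
	for _, name := range names {
		fmt.Printf("%s=%t\n", name, resolved[name])
	}
}
```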

}

func normalizeCNPGClusterSpec(spec cnpgv1.ClusterSpec, customDefinedParameters map[string]string) normalizedCNPGClusterSpec {
normalized := normalizedCNPGClusterSpec{

We could potentially map it via a JSON contract, unless the tags are already busy in our specs, which they probably are. If not, it would map straight into the CNPG spec.
Btw, what do inheritedAnnotations mean tech-wise?

Collaborator

Inherited annotations set annotations on the k8s pod. Thanks to this, we have a way to discover every pod with those annotations, and this is what the OTel collector uses to find pod endpoints to scrape.

thank you
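To make the discovery step in this thread concrete, here is a small sketch of how a scraper could turn annotated pods into scrape targets. It models the idea only: the `pod` type is a stand-in rather than the Kubernetes API type, the `prometheus.io/*` keys are the conventional names assumed here, and `scrapeTargets` is a hypothetical helper, not how the OTel collector is actually implemented.

```go
package main

import (
	"fmt"
	"net"
)

// pod is a simplified stand-in for the fields a scraper would read from a
// Kubernetes pod; it is not the real API type.
type pod struct {
	Name        string
	IP          string
	Annotations map[string]string
}

// scrapeTargets keeps only pods that opt in through the assumed
// prometheus.io/scrape annotation and returns their "ip:port" endpoints.
func scrapeTargets(pods []pod) []string {
	var targets []string
	for _, p := range pods {
		if p.Annotations["prometheus.io/scrape"] != "true" {
			continue
		}
		port := p.Annotations["prometheus.io/port"]
		if port == "" {
			continue // no port advertised, nothing to scrape
		}
		targets = append(targets, net.JoinHostPort(p.IP, port))
	}
	return targets
}

func main() {
	pods := []pod{
		{Name: "pg-1", IP: "10.0.0.1", Annotations: map[string]string{
			"prometheus.io/scrape": "true", "prometheus.io/port": "9187"}},
		{Name: "pooler-rw-1", IP: "10.0.0.2", Annotations: map[string]string{
			"prometheus.io/scrape": "true", "prometheus.io/port": "9127"}},
		{Name: "unrelated", IP: "10.0.0.3"},
	}
	for _, target := range scrapeTargets(pods) {
		fmt.Println(target)
	}
}
```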

assert.Equal(t, postgresMetricsPortString, cluster.Spec.InheritedMetadata.Annotations[prometheusPortAnnotation])
}

func TestClusterSecretExists(t *testing.T) {

nit: Isn't it more readable to split the units into chunks?
A naming idea, just for reference: TestClusterSecret_with_n_expected_rwro_poolers_exist

Collaborator

We decided to follow table-driven tests for tests that use the same methods but have different input/output expectations.

I thought that might be the case. Backing off the nit, all fine!

@mploski force-pushed the postgres-operator-monitoring branch from e3f97e8 to 6638532 on April 16, 2026 20:36
@mploski force-pushed the postgres-operator-monitoring branch from 6638532 to 3b83475 on April 16, 2026 20:43
@mploski merged commit bf07316 into feature/database-controllers on Apr 17, 2026
12 of 15 checks passed
@mploski deleted the postgres-operator-monitoring branch on April 17, 2026 07:09
@github-actions bot locked and limited conversation to collaborators on Apr 17, 2026

4 participants